@hasnatelias
  • Removed dependency on Azure Translation API and Google Translate API.
  • Integrated M2M100ForConditionalGeneration and M2M100Tokenizer for translation.
  • Added language detection using langdetect library.
  • Updated the translate method to handle text translation and logging.
  • Improved error handling and fallback mechanism.

This implementation is sufficient for Japanese translation. Here's why:

sequenceDiagram
    participant Client
    participant router_py as router.py
    participant service_py as service.py
    participant translate_py as translate.py (ModelBasedTranslate)
    participant langdetect
    participant transformers as transformers (M2M100 Model)

    Client->>router_py: POST /rai/v1/moderations (Japanese Prompt)
    activate router_py

    router_py->>service_py: getModerationResult(payload)
    activate service_py

    alt Language is not English
        service_py->>translate_py: translator.translate(prompt)
        activate translate_py

        translate_py->>langdetect: detect(Japanese Prompt)
        activate langdetect
        langdetect-->>translate_py: returns 'ja'
        deactivate langdetect

        translate_py->>transformers: Set source lang to 'ja'
        translate_py->>transformers: Generate translation for 'en'
        activate transformers
        transformers-->>translate_py: returns English text
        deactivate transformers

        translate_py-->>service_py: "Translated English Text", "ja"
        deactivate translate_py
    end

    service_py-->>service_py: Perform moderation on English text
    service_py-->>router_py: Moderation Result
    deactivate service_py

    router_py-->>Client: JSON Response
    deactivate router_py



Model Support: The facebook/m2m100_418M model that has been integrated is a multilingual translation model that explicitly supports Japanese among the 100 languages it was trained on.

Language Detection: The langdetect library automatically identifies the language of the input text. When a user provides a prompt in Japanese, langdetect returns the language code ja.
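The detection step can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the helper name `detect_language`, the injectable `detector` parameter, and the `"en"` default are assumptions made for the example. Only `langdetect.detect` and `DetectorFactory.seed` are real library names.

```python
from typing import Callable, Optional


def detect_language(
    text: str,
    detector: Optional[Callable[[str], str]] = None,
    default: str = "en",
) -> str:
    """Return a language code for `text`, or `default` if detection fails.

    `detector` is a hypothetical injection point so the helper can be
    exercised without langdetect installed.
    """
    if not text or not text.strip():
        # Nothing to detect; fall back to the assumed default.
        return default
    if detector is None:
        # Imported lazily; requires `pip install langdetect`.
        from langdetect import DetectorFactory, detect

        # langdetect is non-deterministic unless a seed is set.
        DetectorFactory.seed = 0
        detector = detect
    try:
        return detector(text)
    except Exception:
        # langdetect raises LangDetectException when it finds no usable features.
        return default
```

Seeding `DetectorFactory` matters most for short prompts, where unseeded detection can return different codes across runs.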

Translation Process: The ModelBasedTranslate class sets the detected language code (ja) as the tokenizer's source language. It then instructs the model to generate the translation in English (en), which is the language the moderation guardrails are designed to process.
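A minimal sketch of this step, following the usage pattern documented for M2M100 in Hugging Face Transformers. The function names `load_m2m100` and `translate_to_english` are assumptions for illustration and may not match the actual `ModelBasedTranslate` implementation.

```python
def load_m2m100(model_name: str = "facebook/m2m100_418M"):
    """Load the model and tokenizer (downloads weights on first use)."""
    # Imported lazily; requires `pip install transformers sentencepiece torch`.
    from transformers import M2M100ForConditionalGeneration, M2M100Tokenizer

    tokenizer = M2M100Tokenizer.from_pretrained(model_name)
    model = M2M100ForConditionalGeneration.from_pretrained(model_name)
    return model, tokenizer


def translate_to_english(text: str, src_lang: str, model, tokenizer) -> str:
    """Translate `text` from `src_lang` (e.g. "ja") into English."""
    # Tell the tokenizer which language the input is written in.
    tokenizer.src_lang = src_lang
    encoded = tokenizer(text, return_tensors="pt")
    # Forcing the first generated token to the English language id
    # selects English as the target language.
    generated = model.generate(
        **encoded,
        forced_bos_token_id=tokenizer.get_lang_id("en"),
    )
    return tokenizer.batch_decode(generated, skip_special_tokens=True)[0]
```

Usage would look like `model, tokenizer = load_m2m100()` followed by `translate_to_english(prompt, "ja", model, tokenizer)`. Loading once and reusing the pair across requests avoids paying the model load cost per call.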

Therefore, the pipeline is fully equipped to receive Japanese text, translate it to English, and then pass it to the moderation checks, fulfilling the requirements of the feature.
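The overall flow, including a fallback when detection or translation fails, might look like the sketch below. The PR text does not spell out the fallback behavior, so returning the original prompt unchanged is an assumption here, as are the function name `translate_for_moderation`, the `"unknown"` code, and the injected `detect` and `to_english` callables.

```python
import logging
from typing import Callable, Tuple

logger = logging.getLogger("translate")


def translate_for_moderation(
    prompt: str,
    detect: Callable[[str], str],
    to_english: Callable[[str, str], str],
) -> Tuple[str, str]:
    """Return (text_for_moderation, detected_language_code).

    `detect` and `to_english` are hypothetical callables standing in for
    the language detector and the M2M100 translation step.
    """
    try:
        lang = detect(prompt)
    except Exception:
        # Assumed fallback: moderate the original prompt as-is.
        logger.warning("Language detection failed; using original text")
        return prompt, "unknown"

    if lang == "en":
        # English prompts skip translation entirely.
        return prompt, lang

    try:
        translated = to_english(prompt, lang)
        logger.info("Translated prompt from %s to en", lang)
        return translated, lang
    except Exception:
        # Assumed fallback: moderate the untranslated prompt.
        logger.exception("Translation failed; using original text")
        return prompt, lang
```

Returning the detected code alongside the text mirrors the `"Translated English Text", "ja"` return shown in the sequence diagram.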

related issue
#21

@hasnatelias changed the title from "Refactor translation functionality to use M2M100 model from Hugging Face Transformers (Japanese Translation SUpport)" to "Refactor translation functionality to use M2M100 model from Hugging Face Transformers (Japanese Translation Support)" on Oct 7, 2025